MiBio: A dataset for OCR post-processing evaluation
Similar resources
OCR Post-Processing for Low Density Languages
We present a lexicon-free post-processing method for optical character recognition (OCR), implemented using weighted finite state machines. We evaluate the technique in a number of scenarios relevant for natural language processing, including creation of new OCR capabilities for low density languages, improvement of OCR performance for a native commercial system, acquisition of knowledge from a...
Stochastic Error-Correcting Parsing for OCR Post-Processing
In this paper, stochastic error-correcting parsing is proposed as a powerful and flexible method to post-process the results of an optical character recognizer (OCR). Deterministic and non-deterministic approaches are possible under the proposed setting. The basic units of the model can be words or complete sentences, and the lexicons or the language databases can be simple enumerations or may ...
Arabic Optical Character Recognition (OCR) Evaluation in Order to Develop a Post-OCR Module
A post-processor for Gurmukhi OCR
A post-processing system for OCR of Gurmukhi script has been developed. Statistical information on Punjabi language syllable combinations, corpora look-up and certain heuristics based on Punjabi grammar rules have been combined to design the post-processor. An improvement of about 3 percentage points in recognition rate, from 94.35% to 97.34%, has been reported on clean images using the post-processing techniques.
Efficient OCR Post-Processing Combining Language, Hypothesis and Error Models
In this paper, an OCR post-processing method that combines a language model, OCR hypothesis information and an error model is proposed. The approach can be seen as a flexible and efficient way to perform Stochastic Error-Correcting Language Modeling. We use Weighted Finite-State Transducers (WFSTs) to represent the language model, the complete set of OCR hypotheses interpreted as a sequence of ...
Journal
Journal title: Data in Brief
Year: 2018
ISSN: 2352-3409
DOI: 10.1016/j.dib.2018.08.099